Abstract

We present, in easily reproducible terms, a simple transformation for offline-parsable grammars which results in a provably terminating parsing program directly top-down interpretable in Prolog. The transformation consists of two steps: (1) removal of empty productions, followed by (2) left-recursion elimination. It is related both to left-corner parsing (where the grammar is compiled, rather than interpreted through a parsing program, and with the advantage of guaranteed termination in the presence of empty productions) and to the Generalized Greibach Normal Form for DCGs (with the advantage of implementation simplicity).

1 Motivation 

Definite clause grammars (DCGs) are one of the simplest and most widely used unification grammar formalisms. They represent a direct augmentation of context-free grammars through the use of (term) unification (a fact that tends to be masked by their usual presentation based on the programming language Prolog). It is obviously important to ask whether certain usual methods and algorithms pertaining to CFGs can be adapted to DCGs, and this general question informs much of the work concerning DCGs, as well as more complex unification grammar formalisms (to cite only a few areas: Earley parsing, LR parsing, left-corner parsing, Greibach Normal Form).

One essential complication when trying to generalize CFG methods to the DCG domain lies in the fact that, whereas the parsing problem for CFGs is decidable, the corresponding problem for DCGs is in general undecidable. This can be shown easily as a consequence of the noteworthy fact that any definite clause program can be viewed as a definite clause grammar "on the empty string", that is, as a DCG where no terminals other than [] are allowed on the right-hand sides of rules. The Turing-completeness of definite clause programs therefore implies the undecidability of the parsing problem for this subclass of DCGs, and a fortiori for DCGs in general.¹ In order to guarantee good computational properties for DCGs, it is then necessary to impose certain restrictions on their form, such as offline-parsability (OP), a nomenclature introduced by Pereira and Warren [11], who define an OP DCG as a grammar whose context-free skeleton CFG is not infinitely ambiguous, and show that OP DCGs lead to a decidable parsing problem.²

Our aim in this paper is to propose a simple transformation for an arbitrary OP DCG, putting it into a form which leads to the completeness of the direct top-down interpretation by the standard Prolog interpreter: parsing is guaranteed to enumerate all solutions to the parsing problem and to terminate. The existence of such a transformation is known: in [2], we have recently introduced a "Generalized Greibach Normal Form" (GGNF) for DCGs, which leads to termination of top-down interpretation in the OP case. However, the available presentation of the GGNF transformation is rather complex (it involves an algebraic study of the fixpoints of certain equational systems representing grammars). Our aim here is to present a related, but much simpler, transformation, which from a theoretical viewpoint performs somewhat less than the GGNF transformation (it involves some encoding of the initial DCG, which the GGNF does not, and it only handles offline-parsable grammars, while the GGNF is defined for arbitrary DCGs),³ but which in practice is extremely easy to implement and displays a comparable behavior when parsing with an OP grammar.

Thanks to Pierre Isabelle and François Perrault for their com[...]ration of this paper.

¹ DCGs on the empty string might be dismissed as extreme, but they are in fact at the core of the offline-parsability concept. See note 3.

² The concept of offline-parsability (under a different name) goes back to [?], where it is shown to be linguistically relevant.

³ The GGNF factorizes an arbitrary DCG into two components: a "unit sub-DCG on the empty string", and another part consisting of rules whose right-hand side starts with a terminal. The decidability of the DCG depends exclusively on certain simple textual properties of the unit sub-DCG. This sub-DCG can be eliminated from the GGNF if and only if the DCG is offline-parsable.

The transformation consists of two steps: (1) empty-production elimination and (2) left-recursion elimination.

The empty-production elimination algorithm is inspired by the usual procedure for context-free grammars. But there are some notable differences, due to the fact that removal of empty productions is in general impossible for non-OP DCGs. The empty-production elimination algorithm is guaranteed to terminate only in the OP case.⁴ It produces a DCG declaratively equivalent to the original grammar on non-empty strings.
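The offline-parsability condition can be tested on the context-free skeleton alone. The Python sketch below is an illustration, not the authors' decision procedure: it relies on the standard fact that a reduced CFG (every nonterminal reachable and productive) is infinitely ambiguous exactly when it is cyclic, i.e. when some nonterminal A admits a derivation A ⇒+ A. The dictionary encoding of grammars (terminals written in brackets) and the names `nullable` and `is_cyclic` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: nonterminal -> list of right-hand sides.
# Terminals are written in brackets, e.g. "[b]"; () is an empty production.
Grammar = Dict[str, List[Tuple[str, ...]]]


def nullable(grammar: Grammar) -> Set[str]:
    """Least fixpoint: nonterminals that can rewrite into the empty string."""
    null: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            if lhs not in null and any(all(s in null for s in rhs) for rhs in alts):
                null.add(lhs)
                changed = True
    return null


def is_cyclic(grammar: Grammar) -> bool:
    """True if some nonterminal A derives A alone (A =>+ A) in the skeleton."""
    null = nullable(grammar)
    # Edge A -> B when A -> alpha B beta with alpha and beta both nullable.
    edges: Dict[str, Set[str]] = {a: set() for a in grammar}
    for lhs, alts in grammar.items():
        for rhs in alts:
            for i, sym in enumerate(rhs):
                others = rhs[:i] + rhs[i + 1:]
                if sym in grammar and all(s in null for s in others):
                    edges[lhs].add(sym)
    # A lies on a cycle iff A is reachable from one of its own successors.
    for start in grammar:
        stack, seen = list(edges[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
    return False
```

On the skeleton s → [number] a, a → a, a → [] of the non-OP "grammar" of note 4, the check finds the cycle a ⇒ a.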

The left-recursion elimination algorithm is adapted from a transformation proposed in [?] in the context of a certain formalism ("Lexical Grammars") which we presented as a possible basis for building reversible grammars.⁵ The key observation (in slightly different terms) was that, in a DCG, if a nonterminal g is defined literally by the two rules (the first of which is left-recursive):



g(X) → g(Y), d(Y,X).
g(X) → t(X).

then the replacement of these two rules by the three rules (where d_tc is a new nonterminal symbol, which represents a kind of "transitive closure" of d):

g(X) → t(Y), d_tc(Y,X).
d_tc(X,X) → [].
d_tc(X,Z) → d(X,Y), d_tc(Y,Z).

preserves the declarative semantics of the grammar.⁶
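The equivalence can be sanity-checked on a small argument-free instance. The Python sketch below is a hypothetical illustration, not the proof referred to in note 6: it takes t → [a] and d → [b] | [c], computes by least-fixpoint iteration all strings of length at most 5 generated from g by the left-recursive pair of rules and by the three replacement rules, and compares the two sets. A bounded comparison of this kind is only evidence, not a proof.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]  # symbols not in the dict are terminals


def language(grammar: Grammar, start: str, max_len: int) -> Set[Tuple[str, ...]]:
    """All strings of length <= max_len derivable from start (least fixpoint)."""
    lang: Dict[str, Set[Tuple[str, ...]]] = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for rhs in alts:
                # Concatenate the current approximations of each symbol of rhs.
                results: Set[Tuple[str, ...]] = {()}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {(sym,)}
                    results = {r + o for r in results for o in options
                               if len(r) + len(o) <= max_len}
                new = results - lang[lhs]
                if new:
                    lang[lhs] |= new
                    changed = True
    return lang[start]


base = {"t": [("a",)], "d": [("b",), ("c",)]}
left_recursive = {**base, "g": [("g", "d"), ("t",)]}
replaced = {**base, "g": [("t", "d_tc")], "d_tc": [(), ("d", "d_tc")]}
```

Both versions generate a followed by any sequence of b's and c's; the fixpoint formulation is used here only because it handles the left-recursive version without looping.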

We remarked in [?] that this transformation "is closely related to left-corner parsing", but did not give details. In a recent paper, Mark Johnson introduces "a left-corner program transformation for natural language parsing", which has some similarity to the above transformation, but which is applied to definite clause programs rather than to DCGs. He proves that this transformation respects declarative equivalence, and also shows, using a model-theoretic approach, the close connection of his transformation with left-corner parsing [?].⁷

It must be noted that the left-recursion elimination procedure can be applied to any DCG, whether OP or not. Even in the case where the grammar is OP, however, it will not lead to a terminating parsing algorithm unless empty productions have previously been eliminated from the grammar, a problem which is shared by the usual left-corner parser-interpreter.

⁴ The fact that the standard CFG empty-production elimination transformation is always possible is related to the fact that this transformation does not preserve degrees of ambiguity. For instance, the infinitely ambiguous grammar S → [b] A, A → A, A → [] is simplified into the grammar S → [b]. This type of simplification is generally impossible in a DCG. Consider for instance the "grammar" s(X) → [number] a(X), a(succ(X)) → a(X), a(0) → [].

⁵ The method goes back to a transformation used to compile out certain local cases of left-recursion from DCGs in the context of the Machine Translation prototype CRITTER [?].

⁶ A proof of this fact, based on a comparison of proof-trees for the original and the transformed grammar, is given in [?].

⁷ His paper does not state termination conditions for the transformed program. Such termination conditions would probably involve some generalized notion of offline-parsability [?]. By contrast, we prove termination only for DCGs which are OP in the original sense of Pereira and Warren, but this case seems to us to represent much of the core issue, and to lead to some direct extensions. For instance, the DCG transformation proposed here can be directly applied to "guided" programs in the sense of [?].



Due to space limitations, we do not give correctness proofs for the algorithms presented here, but expect to publish them in a fuller version of this paper. These algorithms have been implemented in a slightly extended version, where they are also used to decide whether the grammar proposed for transformation is in fact offline-parsable or not.

2 Empty-production elimination

It can be proven that, if DCG0 is an OP DCG, the following transformation, which involves repeated partial evaluation of rules that rewrite into the empty string, terminates after a finite number of steps and produces a grammar DCG without empty productions which is equivalent to the initial grammar on non-empty strings.⁸

input: an offline-parsable DCG1.

output: a DCG without empty rules, equivalent to DCG1 on non-empty strings.

algorithm:

initialize LIST1 to a list of the rules of DCG1; set LIST2 to the empty list.

while there exists an empty rule ER: A(T1,...,Tk) → [] in LIST1 do:
  move ER to LIST2.
  for each rule R: B(...) → α in LIST1 such that α contains an instance of A(...)
      (including new such rules created inside this loop) do:
    for each such instance A(S1,...,Sk) unifiable with A(T1,...,Tk) do:
      append to LIST1 a rule R': B(...) → α' obtained from R
        by removing A(S1,...,Sk) from α (or by replacing it with [] if this was
        the only nonterminal in α), and by unifying the Ti's with the Si's.
set DCG to LIST1.

For instance, the grammar consisting of the nine rules appearing above the separation in fig. 1 is transformed into the following grammar (see figure):

s(s(NP,VP)) → np(NP), vp(VP).
np(np(N,C)) → n(N), comp(C).
n(n(people)) → [people].
vp(vp(v(sleep),C)) → [sleep], comp(C).
comp(c(C,A)) → comp(C), adv(A).
adv(adv(here)) → [here].
adv(adv(today)) → [today].
np(np(n(you),C)) → comp(C).
np(np(N,nil)) → n(N).
comp(c(nil,A)) → adv(A).
vp(vp(v(sleep),nil)) → [sleep].
s(s(np(np(n(you),nil)),VP)) → vp(VP).

⁸ When DCG0 is not OP, the transformation may produce an infinite number of rules, but a simple extension of the algorithm can detect this situation: the transformation stops and the grammar is declared not to be OP.
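The control structure of the empty-production elimination algorithm can be sketched in Python for its argument-free (context-free) special case, which is a simplification assumed here for brevity: with no arguments there is no unification, an "instance" of A is simply an occurrence of the symbol A, and the set of possible rules is finite, so a hypothetical `seen` set that discards rules already produced suffices for termination. The DCG version must in addition unify the Ti's with the Si's, which is where a non-OP grammar can keep producing new rules. The rule encoding and the name `eliminate_empty` are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: (lhs, rhs); terminals are written "[word]",
# and an empty right-hand side () stands for the empty rule "lhs -> []".
Rule = Tuple[str, Tuple[str, ...]]


def eliminate_empty(rules: List[Rule]) -> List[Rule]:
    """Context-free sketch of the empty-production elimination loop."""
    list1: List[Rule] = list(rules)
    list2: List[Rule] = []            # empty rules already processed
    seen: Set[Rule] = set(list1)      # added so that ground rules are never re-created
    while True:
        empty = next((r for r in list1 if not r[1]), None)
        if empty is None:
            break
        list1.remove(empty)           # move ER to LIST2
        list2.append(empty)
        a = empty[0]
        i = 0
        while i < len(list1):         # also visits rules appended inside this loop
            lhs, rhs = list1[i]
            for pos, sym in enumerate(rhs):
                if sym == a:          # remove this occurrence of A
                    new_rule = (lhs, rhs[:pos] + rhs[pos + 1:])
                    if new_rule not in seen:
                        seen.add(new_rule)
                        list1.append(new_rule)
            i += 1
    return list1                      # DCG := LIST1
```

On the argument-free skeleton of the nine-rule example (with n → [] and comp → [] as its empty rules), the sketch yields the skeletons of the twelve rules listed above, including s → vp, which arises only after np → [] has itself been derived and processed.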



3 Left-recursion elimination 

The transformation can be logically divided into two steps: (1) an encoding of DCG into a "generic" form DCG', and (2) a simple replacement of a certain group of left-recursive rules in DCG' by a certain equivalent non-left-recursive group of rules, yielding a top-down interpretable DCG". An example of the transformation DCG → DCG' → DCG" is given in fig. 2.

The encoding is performed by the following algorithm:

input: an offline-parsable DCG without empty rules.
output: an equivalent "encoding" DCG'.
algorithm:

initialize LIST to a list of the rules of DCG.
initialize DCG' to the list of rules (literally):
  g(X) → g(Y), d(Y,X).
  g(X) → t(X).
while there exists a rule R of the form A(T1,...,Tk) → B(S1,...,Sl) α in LIST do:
  remove R from LIST;
  add to DCG' a rule R': d(B(S1,...,Sl), A(T1,...,Tk)) → α',
    where α' is obtained by replacing any C(V1,...,Vm) in α by g(C(V1,...,Vm)),
    or is set to [] in the case where α is empty.
while there exists a rule R of the form A(T1,...,Tk) → [terminal] α in LIST do:
  remove R from LIST;
  add to DCG' a rule R': t(A(T1,...,Tk)) → [terminal] α',
    where α' is obtained by replacing any C(V1,...,Vm) in α by g(C(V1,...,Vm)),
    or is set to [] in the case where α is empty.
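Because the encoding is purely syntactic, it can be sketched as string manipulation. The Python below is a hypothetical illustration, not the authors' implementation: a rule is a head string plus a list of body items (terminals written "[word]"), rules are processed in their original order rather than in two separate passes (which yields the same set of rules), and the output is printed in standard Prolog `-->` syntax. The names `wrap`, `encode` and `show` are assumptions.

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]  # (head, body items); terminals written "[word]"


def wrap(items: List[str]) -> List[str]:
    """Replace each nonterminal C(V1,...,Vm) by g(C(V1,...,Vm)); keep terminals."""
    return [x if x.startswith("[") else f"g({x})" for x in items]


def encode(rules: List[Rule]) -> List[Rule]:
    """Build DCG' from an offline-parsable DCG without empty rules."""
    out: List[Rule] = [("g(X)", ["g(Y)", "d(Y,X)"]), ("g(X)", ["t(X)"])]
    for head, body in rules:
        if not body:
            raise ValueError(f"empty rule for {head}: eliminate empty rules first")
        first, rest = body[0], body[1:]
        if first.startswith("["):      # A(...) -> [terminal] alpha
            out.append((f"t({head})", [first] + wrap(rest)))
        else:                          # A(...) -> B(...) alpha
            out.append((f"d({first},{head})", wrap(rest)))
    return out


def show(rule: Rule) -> str:
    """Print a rule in Prolog DCG syntax; an empty body becomes []."""
    head, body = rule
    return f"{head} --> {', '.join(body) if body else '[]'}."
```

Applied to the twelve rules of the example grammar, `encode` reproduces the DCG' rules of fig. 2 in the same order.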

The procedure is very simple. It involves the creation of a generic nonterminal g(X), of arity one, which performs a task equivalent to the original nonterminals s(X1,...,Xn), vp(X1,...,Xm), .... The goal g(s(X1,...,Xn)), for instance, plays the same role for parsing a sentence as did the goal s(X1,...,Xn) in the original grammar.

Two further generic nonterminals are introduced: t(X) accounts for rules whose right-hand side begins with a terminal, while d(Y,X) accounts for rules whose right-hand side begins with a nonterminal. The rationale behind the encoding is best understood from the following examples, where ⇒ represents rule rewriting:

$$
\begin{aligned}
& vp(vp(v(sleep),C)) \rightarrow [sleep],\ comp(C) \\
\Rightarrow\ & g(vp(vp(v(sleep),C))) \rightarrow [sleep],\ g(comp(C)) \\
\Rightarrow\ & g(X) \rightarrow \underbrace{[sleep],\ \big(\{X = vp(vp(v(sleep),C))\},\ g(comp(C))\big)}_{t(X)}
\end{aligned}
$$

$$
\begin{aligned}
& s(s(NP,VP)) \rightarrow np(NP),\ vp(VP) \\
\Rightarrow\ & g(s(s(NP,VP))) \rightarrow g(np(NP)),\ g(vp(VP)) \\
\Rightarrow\ & g(X) \rightarrow g(Y),\ \underbrace{\big(\{X = s(s(NP,VP)),\ Y = np(NP)\},\ g(vp(VP))\big)}_{d(Y,X)}
\end{aligned}
$$

The second example illustrates the role played by d(Y,X) in the encoding. This nonterminal has the following interpretation: X is an "immediate" extension of Y using the given rule. In other words, Y corresponds to an "immediate left-corner" of X.

The left-recursion elimination is now performed by the following "algorithm" [?]:

input: a DCG' encoded as above.
output: an equivalent non-left-recursive DCG".
algorithm:

initialize DCG" to DCG'.
in DCG", replace literally the rules:
  g(X) → g(Y), d(Y,X).
  g(X) → t(X).
by the rules:
  g(X) → t(Y), d_tc(Y,X).
  d_tc(X,X) → [].
  d_tc(X,Z) → d(X,Y), d_tc(Y,Z).

In this transformation, the new nonterminal d_tc plays the role of a kind of transitive closure of d. It can be seen that, relative to DCG", for any string $w$ and for any ground term $z$, the fact that $g(z)$ rewrites into $w$ (or, equivalently, that there exists a ground term $x$ such that $t(x), d\_tc(x,z)$ rewrites into $w$) is equivalent to the existence of a sequence of ground terms $x = x_1, \ldots, x_k = z$ and a sequence of strings $w_1, \ldots, w_k$ such that $t(x_1)$ rewrites into $w_1$, $d(x_1,x_2)$ rewrites into $w_2$, ..., $d(x_{k-1},x_k)$ rewrites into $w_k$, and such that $w$ is the string concatenation $w = w_1 \cdots w_k$. From our previous remark on the meaning of d(Y,X), this can be interpreted as saying that "constituent $x$ is a left-corner of constituent $z$", relative to string $w$.
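The role of d_tc can be seen in a small recognizer that follows the rules of DCG" directly. The Python sketch below is hypothetical and drops all arguments: `T_RULES` and `D_RULES` hold the context-free skeleton of the t and d rules of fig. 2, `g` tries a terminal-initial t rule and then climbs through left-corner steps with `d_tc`, and every procedure returns the set of input positions where it can end. Since empty rules have been eliminated, every call of `g` consumes at least one word; the only steps that consume nothing are d rules with empty bodies, and these form no cycle in this skeleton, so the recursion terminates even though the grammar contains the left-recursive rule for comp.

```python
from typing import List, Set, Tuple

# Hypothetical argument-free skeleton of the t and d rules of fig. 2.
# Body items: terminals written "[word]"; anything else is a call g(item).
T_RULES: List[Tuple[str, List[str]]] = [
    ("n", ["[people]"]),
    ("vp", ["[sleep]", "comp"]),
    ("adv", ["[here]"]),
    ("adv", ["[today]"]),
    ("vp", ["[sleep]"]),
]
D_RULES: List[Tuple[str, str, List[str]]] = [  # (left corner Y, parent X, body)
    ("np", "s", ["vp"]),
    ("n", "np", ["comp"]),
    ("comp", "comp", ["adv"]),
    ("comp", "np", []),
    ("n", "np", []),
    ("adv", "comp", []),
    ("vp", "s", []),
]


class Recognizer:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens

    def seq(self, body: List[str], i: int) -> Set[int]:
        """End positions of a rule body recognized from position i."""
        positions = {i}
        for item in body:
            nxt: Set[int] = set()
            for p in positions:
                if item.startswith("["):
                    if p < len(self.tokens) and self.tokens[p] == item[1:-1]:
                        nxt.add(p + 1)
                else:
                    nxt |= self.g(item, p)
            positions = nxt
        return positions

    def g(self, x: str, i: int) -> Set[int]:
        """g(X) --> t(Y), d_tc(Y,X)."""
        ends: Set[int] = set()
        for y, body in T_RULES:
            for j in self.seq(body, i):
                ends |= self.d_tc(y, x, j)
        return ends

    def d_tc(self, y: str, x: str, i: int) -> Set[int]:
        """d_tc(X,X) --> [].   d_tc(Y,X) --> d(Y,Z), d_tc(Z,X)."""
        ends = {i} if y == x else set()
        for corner, parent, body in D_RULES:
            if corner == y:
                for j in self.seq(body, i):
                    ends |= self.d_tc(parent, x, j)
        return ends


def recognizes(tokens: List[str], start: str = "s") -> bool:
    return len(tokens) in Recognizer(tokens).g(start, 0)
```

For instance, `recognizes(["here", "sleep"])` holds because np rewrites into comp and comp into adv, while `recognizes(["people"])` fails since no vp follows.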

The grammar DCG" can now be compiled in the standard way, via the adjunction of two "difference list" arguments, into a Prolog program which can be executed directly. If we started from an offline-parsable grammar DCG0, this program will enumerate all solutions to the parsing problem and terminate after a finite number of steps [?].
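The standard compilation adds two difference-list arguments to every nonterminal and threads them through the body. A minimal Python sketch of that translation follows; it is an illustration under stated assumptions, not the translation performed by any particular Prolog system (which do this themselves and may produce slightly different clauses). It assumes rules are given as a head string and a list of body items, with terminals written "[word]", and that the grammar uses no variables named S, S0, S1, ..., which the hypothetical functions `add_args` and `compile_rule` use for string positions.

```python
from typing import List


def add_args(term: str, s_in: str, s_out: str) -> str:
    """Append two string-position arguments: g(X) -> g(X,S0,S)."""
    if term.endswith(")"):
        return f"{term[:-1]},{s_in},{s_out})"
    return f"{term}({s_in},{s_out})"


def compile_rule(head: str, body: List[str]) -> str:
    """Translate one DCG rule into a Prolog clause threading S0 ... S."""
    goals = []
    for k, item in enumerate(body):
        s_in = f"S{k}"
        s_out = "S" if k == len(body) - 1 else f"S{k + 1}"
        if item.startswith("["):       # terminal: consume one word
            goals.append(f"{s_in} = [{item[1:-1]}|{s_out}]")
        else:                          # nonterminal: thread the difference list
            goals.append(add_args(item, s_in, s_out))
    if not goals:                      # empty body: consume nothing
        goals.append("S0 = S")
    return f"{add_args(head, 'S0', 'S')} :- {', '.join(goals)}."
```

For example, the rule g(X) → t(Y), d_tc(Y,X) becomes the clause g(X,S0,S) :- t(Y,S0,S1), d_tc(Y,X,S1,S), and a query such as g(s(T),[people,sleep],[]) would then ask for the parses of the two-word input.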
DCG

s(s(NP,VP)) → np(NP), vp(VP).
np(np(N,C)) → n(N), comp(C).
n(n(people)) → [people].
vp(vp(v(sleep),C)) → [sleep], comp(C).
comp(c(C,A)) → comp(C), adv(A).
adv(adv(here)) → [here].
adv(adv(today)) → [today].
np(np(n(you),C)) → comp(C).
np(np(N,nil)) → n(N).
comp(c(nil,A)) → adv(A).
vp(vp(v(sleep),nil)) → [sleep].
s(s(np(np(n(you),nil)),VP)) → vp(VP).

DCG'

g(X) → g(Y), d(Y,X).
g(X) → t(X).
d(np(NP),s(s(NP,VP))) → g(vp(VP)).
d(n(N),np(np(N,C))) → g(comp(C)).
t(n(n(people))) → [people].
t(vp(vp(v(sleep),C))) → [sleep], g(comp(C)).
d(comp(C),comp(c(C,A))) → g(adv(A)).
t(adv(adv(here))) → [here].
t(adv(adv(today))) → [today].
d(comp(C),np(np(n(you),C))) → [].
d(n(N),np(np(N,nil))) → [].
d(adv(A),comp(c(nil,A))) → [].
t(vp(vp(v(sleep),nil))) → [sleep].
d(vp(VP),s(s(np(np(n(you),nil)),VP))) → [].

DCG"

g(X) → t(Y), d_tc(Y,X).
d_tc(X,X) → [].
d_tc(X,Z) → d(X,Y), d_tc(Y,Z).
d(np(NP),s(s(NP,VP))) → g(vp(VP)).
d(n(N),np(np(N,C))) → g(comp(C)).
t(n(n(people))) → [people].
t(vp(vp(v(sleep),C))) → [sleep], g(comp(C)).
d(comp(C),comp(c(C,A))) → g(adv(A)).
t(adv(adv(here))) → [here].
t(adv(adv(today))) → [today].
d(comp(C),np(np(n(you),C))) → [].
d(n(N),np(np(N,nil))) → [].
d(adv(A),comp(c(nil,A))) → [].
t(vp(vp(v(sleep),nil))) → [sleep].
d(vp(VP),s(s(np(np(n(you),nil)),VP))) → [].

Figure 2: Encoding (DCG') of a grammar (DCG) and left-recursion elimination (DCG").



